{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "gpuType": "T4"
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "**Code Credit: Hugging Face**\n",
        "\n",
        "**Dataset Credit: https://twitter.com/Dorialexander/status/1681671177696161794 **"
      ],
      "metadata": {
        "id": "YO20b0TD5D4w"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Finetune Llama-2-7b on a Google colab\n",
        "\n",
        "Welcome to this Google Colab notebook that shows how to fine-tune the recent Llama-2-7b model on a single Google colab and turn it into a chatbot\n",
        "\n",
        "We will leverage PEFT library from Hugging Face ecosystem, as well as QLoRA for more memory efficient finetuning"
      ],
      "metadata": {
        "id": "C2EgqEPDQ8v6"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Setup\n",
        "\n",
        "Run the cells below to setup and install the required libraries. For our experiment we will need `accelerate`, `peft`, `transformers`, `datasets` and TRL to leverage the recent [`SFTTrainer`](https://huggingface.co/docs/trl/main/en/sft_trainer). We will use `bitsandbytes` to [quantize the base model into 4bit](https://huggingface.co/blog/4bit-transformers-bitsandbytes). We will also install `einops` as it is a requirement to load Falcon models."
      ],
      "metadata": {
        "id": "i-tTvEF1RT3y"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "mNnkgBq7Q3EU"
      },
      "outputs": [],
      "source": [
        "!pip install -q -U trl transformers accelerate git+https://github.com/huggingface/peft.git\n",
        "!pip install -q datasets bitsandbytes einops wandb"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Dataset\n",
        "\n"
      ],
      "metadata": {
        "id": "Rnqmq7amRrU8"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from datasets import load_dataset\n",
        "\n",
        "#dataset_name = \"timdettmers/openassistant-guanaco\" ###Human ,.,,,,,, ###Assistant\n",
        "\n",
        "dataset_name = 'AlexanderDoria/novel17_test' #french novels\n",
        "dataset = load_dataset(dataset_name, split=\"train\")"
      ],
      "metadata": {
        "id": "0X3kHnskSWU4"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Loading the model"
      ],
      "metadata": {
        "id": "rjOMoSbGSxx9"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import torch\n",
        "from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, AutoTokenizer\n",
        "\n",
        "model_name = \"TinyPixel/Llama-2-7B-bf16-sharded\"\n",
        "\n",
        "bnb_config = BitsAndBytesConfig(\n",
        "    load_in_4bit=True,\n",
        "    bnb_4bit_quant_type=\"nf4\",\n",
        "    bnb_4bit_compute_dtype=torch.float16,\n",
        ")\n",
        "\n",
        "model = AutoModelForCausalLM.from_pretrained(\n",
        "    model_name,\n",
        "    quantization_config=bnb_config,\n",
        "    trust_remote_code=True\n",
        ")\n",
        "model.config.use_cache = False"
      ],
      "metadata": {
        "id": "ZwXZbQ2dSwzI"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "Let's also load the tokenizer below"
      ],
      "metadata": {
        "id": "xNqIYtQcUBSm"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)\n",
        "tokenizer.pad_token = tokenizer.eos_token"
      ],
      "metadata": {
        "id": "XDS2yYmlUAD6"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "from peft import LoraConfig, get_peft_model\n",
        "\n",
        "lora_alpha = 16\n",
        "lora_dropout = 0.1\n",
        "lora_r = 64\n",
        "\n",
        "peft_config = LoraConfig(\n",
        "    lora_alpha=lora_alpha,\n",
        "    lora_dropout=lora_dropout,\n",
        "    r=lora_r,\n",
        "    bias=\"none\",\n",
        "    task_type=\"CAUSAL_LM\"\n",
        ")"
      ],
      "metadata": {
        "id": "dQdvjTYTT1vQ"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Loading the trainer"
      ],
      "metadata": {
        "id": "dzsYHLwIZoLm"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Here we will use the [`SFTTrainer` from TRL library](https://huggingface.co/docs/trl/main/en/sft_trainer) that gives a wrapper around transformers `Trainer` to easily fine-tune models on instruction based datasets using PEFT adapters. Let's first load the training arguments below."
      ],
      "metadata": {
        "id": "aTBJVE4PaJwK"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from transformers import TrainingArguments\n",
        "\n",
        "output_dir = \"./results\"\n",
        "per_device_train_batch_size = 4\n",
        "gradient_accumulation_steps = 4\n",
        "optim = \"paged_adamw_32bit\"\n",
        "save_steps = 100\n",
        "logging_steps = 10\n",
        "learning_rate = 2e-4\n",
        "max_grad_norm = 0.3\n",
        "max_steps = 100\n",
        "warmup_ratio = 0.03\n",
        "lr_scheduler_type = \"constant\"\n",
        "\n",
        "training_arguments = TrainingArguments(\n",
        "    output_dir=output_dir,\n",
        "    per_device_train_batch_size=per_device_train_batch_size,\n",
        "    gradient_accumulation_steps=gradient_accumulation_steps,\n",
        "    optim=optim,\n",
        "    save_steps=save_steps,\n",
        "    logging_steps=logging_steps,\n",
        "    learning_rate=learning_rate,\n",
        "    fp16=True,\n",
        "    max_grad_norm=max_grad_norm,\n",
        "    max_steps=max_steps,\n",
        "    warmup_ratio=warmup_ratio,\n",
        "    group_by_length=True,\n",
        "    lr_scheduler_type=lr_scheduler_type,\n",
        ")"
      ],
      "metadata": {
        "id": "OCFTvGW6aspE"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "Then finally pass everthing to the trainer"
      ],
      "metadata": {
        "id": "I3t6b2TkcJwy"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from trl import SFTTrainer\n",
        "\n",
        "max_seq_length = 512\n",
        "\n",
        "trainer = SFTTrainer(\n",
        "    model=model,\n",
        "    train_dataset=dataset,\n",
        "    peft_config=peft_config,\n",
        "    dataset_text_field=\"text\",\n",
        "    max_seq_length=max_seq_length,\n",
        "    tokenizer=tokenizer,\n",
        "    args=training_arguments,\n",
        ")"
      ],
      "metadata": {
        "id": "TNeOBgZeTl2H"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "We will also pre-process the model by upcasting the layer norms in float 32 for more stable training"
      ],
      "metadata": {
        "id": "GWplqqDjb3sS"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "for name, module in trainer.model.named_modules():\n",
        "    if \"norm\" in name:\n",
        "        module = module.to(torch.float32)"
      ],
      "metadata": {
        "id": "7OyIvEx7b1GT"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Train the model"
      ],
      "metadata": {
        "id": "1JApkSrCcL3O"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now let's train the model! Simply call `trainer.train()`"
      ],
      "metadata": {
        "id": "JjvisllacNZM"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "trainer.train()"
      ],
      "metadata": {
        "id": "_kbS7nRxcMt7"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "During training, the model should converge nicely as follows:\n",
        "\n",
        "![image](https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/loss-falcon-7b.png)\n",
        "\n",
        "The `SFTTrainer` also takes care of properly saving only the adapters during training instead of saving the entire model."
      ],
      "metadata": {
        "id": "H5c0ppfasK29"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "model_to_save = trainer.model.module if hasattr(trainer.model, 'module') else trainer.model  # Take care of distributed/parallel training\n",
        "model_to_save.save_pretrained(\"outputs\")"
      ],
      "metadata": {
        "id": "v2CEgCF14M0m"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "lora_config = LoraConfig.from_pretrained('outputs')\n",
        "model = get_peft_model(model, lora_config)"
      ],
      "metadata": {
        "id": "qmA4G6C64dJ4"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "dataset['text']"
      ],
      "metadata": {
        "id": "yWEM89A48NrU"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "text = \"Écrire un texte dans un style baroque sur la glace et le feu ### Assistant: Si j'en luis éton\"\n",
        "device = \"cuda:0\"\n",
        "\n",
        "inputs = tokenizer(text, return_tensors=\"pt\").to(device)\n",
        "outputs = model.generate(**inputs, max_new_tokens=50)\n",
        "print(tokenizer.decode(outputs[0], skip_special_tokens=True))"
      ],
      "metadata": {
        "id": "pgt86z-x4diG"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "from huggingface_hub import login\n",
        "login()"
      ],
      "metadata": {
        "id": "niyf5_Kc4ugO"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "model.push_to_hub(\"llama2-qlora-finetunined-french\")"
      ],
      "metadata": {
        "id": "fjOInz-Q-54k"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [],
      "metadata": {
        "id": "cUmT5MRD_D9k"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}